Context Sensitive Word Deletion Model for Statistical Machine Translation
نویسندگان
چکیده
Word deletion (WD) errors can lead to poor comprehension of the meaning of source translated sentences in phrase-based statistical machine translation (SMT), and have a critical impact on the adequacy of the translation results generated by SMT systems. In this paper, first we classify the word deletion into two categories, wanted and unwanted word deletions. For these two kinds of word deletions, we propose a maximum entropy based word deletion model to improve the translation quality in phrase-based SMT. Our proposed model are based on features automatically learned from a real-word bitext. In our experiments on Chinese-to-English news and web translation tasks, the results show that our approach is capable of generating more adequate translations compared with the baseline system, and our proposed word deletion model yields a +0.99 BLEU improvement and a -2.20 TER reduction on the NIST machine translation evaluation corpora.
منابع مشابه
ارائه یک رتبهبند برای خطایاب معنایی با استفاده از ویژگیهای حساس به متن
Nowadays, a large volume of documents is generated daily. These documents generated by different persons, thus, the documents contain spelling errors. These spelling errors cause quality of the documents are decrease. Therefore, existence of automatic writing assistance tools such as spell checker/corrector can help to improve their quality. Context-sensitive are misspelled words that have been...
متن کاملDiscourse-aware Statistical Machine Translation as a Context-sensitive Spell Checker
Real-word errors or context sensitive spelling errors, are misspelled words that have been wrongly converted into another word of vocabulary. One way to detect and correct real-word errors is using Statistical Machine Translation (SMT), which translates a text containing some real-word errors into a correct text of the same language. In this paper, we improve the results of mentioned SMT system...
متن کاملContext-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
Most current statistical machine translation (SMT) systems make very little use of contextual information to select a translation candidate for a given input language phrase. However, despite evidence that rich context features are useful in stand-alone translation disambiguation tasks, recent studies reported that incorporating context-rich approaches from Word Sense Disambiguation (WSD) metho...
متن کاملStatistical Machine Translation with Local Language Models
Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models ...
متن کاملWord Sense Disambiguation for Statistical Machine Translation
While much effort has been put in designing and evaluating Word Sense Disambiguation (WSD) models for translation in the WSD community, standard Statistical Machine Translation (SMT) systems have achieved remarkable improvements in translation quality without modeling WSD explicitly. However, inspecting SMT output suggests that SMT needs better semantic modeling to accurately translate meaning....
متن کامل